Data Bootcamp Final Project -- Fund Evaluation using Python

Data Bootcamp UG 2016 Spring

Author: Ruohao Xin NYU Stern Class of 2016

Project motivation: It is inefficient to perform fund evaluation with Excel. I want to automate most of the basic analysis & regression with Python.

In this notebook, I am using XOP (SPDR Oil & Gas Production ETF)as an example.

I thought it is interesting as an example given the high volatility in oil prices over the past year.

At the end of the analysis, we can see if it is valuable to invest in oil related industry between the selected period.

Data Source:

Yahoo Finance for ETF information
Ken French Library for fama-french U.S equity factors

Project Reference: Data Bootcamp Github by Professor Backus and Statsmodel notebook by Brian Leblanc



In [42]:

    
#installing packages to extract prices from Yahoo
import pandas as pd
from pandas.io.data import DataReader
import datetime as dt
import matplotlib as mpl

print('Pandas version: ', pd.__version__)
print('Matplotlib version: ', mpl.__version__)
print('Today: ', dt.date.today())

%matplotlib inline









    



Pandas version:  0.17.1
Matplotlib version:  1.5.1
Today:  2016-05-11

Read in data from Yahoo Finance



In [59]:

    
start = dt.datetime(2012, 1, 1)  
end = dt.datetime(2015, 12, 31)    
tickers = ['XOP','SPY']   
# the time period need to long enough for the regression to work
# SPY is the S&P 500 index ETF, I am using it as a proxy for market performance over the same period
# I am using 3 years(36 months) of data from XOP and SPY, a lot of other ETFs don't go back longer than 3 years

# if first argument is a list, this returns a panel
p = DataReader(tickers, 'yahoo', start, end)

df = p.to_frame().unstack()['Adj Close'] #unpack the data and only save the "Adj Close" column
dm = df.resample('MS')                # this converts the data into first day of each month using resampling                    

dmpct = dm.pct_change().shift(-1) #to fix the mismatch issue of the pct_change function, shift all the result up 1 unit
dmpct1 = dmpct.round(4)*100       #multiplication to make it consistent with fama french factors  
dmpct1.head(5)                    #Now we have the fund/ETF monthly return ready to go.



In [61]:

    
#Basic visualization of the fund data -- daily price
dm.plot(kind='line',figsize = (10,6), title = 'Daily Price')
#now this chart doesn't really tell you anything useful as the price of two ETF deviates









    Out[61]:





<matplotlib.axes._subplots.AxesSubplot at 0x1c5ada23b38>



In [60]:

    
#Basic visualization of the fund data -- monthly return
dmpct1.plot(kind='line',figsize = (10,6), title='Monthly Return') 
#the monthly return chart is much more informative as it eliminates the level effect of prices 
#as you can see, XOP(green)is a lot more volatile comparing to the market return









    Out[60]:





<matplotlib.axes._subplots.AxesSubplot at 0x1c5ad9e1940>



In [65]:

    
dmpct1.plot(kind='hist',subplots=True, figsize = (8,4),bins = 25,title="histograph for SPY & XOP")
# we can use histogram to see the distribution of returns for the S&P 500 comparing to the fund we are using
# It is clear that XOP has more losses than S&P 500 ...not a good sign if you want to make money









    Out[65]:





array([<matplotlib.axes._subplots.AxesSubplot object at 0x000001C5AEE80438>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x000001C5AF033400>], dtype=object)

Basic performance analysis



In [49]:

    
print(dmpct1.mean())  #average return for the fund
print(dmpct1.std())   #average standard deviation for the fund









    



SPY    1.171915
XOP   -0.791915
dtype: float64
SPY    2.262183
XOP    7.251199
dtype: float64

Read in data from Ken French data library

Now we need to perform a basic factor regression to see how the ETF correlates with the factors (Market, SMB, HML)

Market(Market - Rf) --> excess return on the market

SMB: (Small minue Big) --> Trade on small firms versus big firms in market cap

HML: (High minus Low) --> Trade on high B/M ratio companies versus low B/M companies



In [50]:

    
ff = DataReader('F-F_Research_Data_Factors', 'famafrench')[0]
ff.columns = ['Market', 'SMB', 'HML', 'Rf'] #rename the columns to make it easier to understand 
ff.head()

ff1=ff[ff.index >= 201201]  #this is to keep the time series of factors and ETF returns consistent 
ff1.head()

Linear Regression



In [51]:

    
import statsmodels.formula.api as smf  #introduce the regression package 
ff1.index = range(1,len(ff1) + 1) 
dmpct1.index = range(1,len(dmpct1) + 1) #I need to reset the index of both to merge the databases corrrectly 
combine = pd.concat([dmpct1, ff1], axis=1) #This step allows me to have the returns and factors alligned nicely
combine.head()



In [52]:

    
combine['Excess'] = combine['XOP']- combine['Rf']  
# After combining two dataset, I need to get excess market return for the given fund/ETF
combine.head()



In [53]:

    
model = smf.ols(formula = 'Excess ~ Market + SMB + HML', data = combine).fit()     #Fits the linear regression model
print(model.summary())  #Prints off a nice summary!









    



                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Excess   R-squared:                       0.061
Model:                            OLS   Adj. R-squared:                 -0.005
Method:                 Least Squares   F-statistic:                    0.9308
Date:                Wed, 11 May 2016   Prob (F-statistic):              0.434
Time:                        17:07:57   Log-Likelihood:                -157.82
No. Observations:                  47   AIC:                             323.6
Df Residuals:                      43   BIC:                             331.0
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -1.0319      1.155     -0.894      0.376        -3.360     1.297
Market         0.2609      0.351      0.744      0.461        -0.446     0.968
SMB            0.3192      0.503      0.635      0.529        -0.694     1.333
HML            0.8241      0.589      1.400      0.169        -0.363     2.011
==============================================================================
Omnibus:                        1.327   Durbin-Watson:                   1.768
Prob(Omnibus):                  0.515   Jarque-Bera (JB):                0.552
Skew:                          -0.080   Prob(JB):                        0.759
Kurtosis:                       3.506   Cond. No.                         3.72
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.



In [54]:

    
result = pd.DataFrame(model.params)
result.columns = ['XOP']
result

Calculate Sharpe Ratio

The sharpe ratio is commonly used to measure risk-adjusted return for given asset

Sharpe ratio = (Mean asset return − Risk-free rate)/Standard deviation of asset return

It measure the excess return per unit of additional risk. It can be positive, negative or zero(risk free) given the underlying asset.



In [55]:

    
print('Average excess return: ',combine['Excess'].mean())
print('Standard deviation: ',combine['Excess'].std())
Sharpe = combine['Excess'].mean()/combine['Excess'].std()  #we are using average excess return on the fund as calculated above
print ('Sharpe Ratio: ',Sharpe)









    



Average excess return:  -0.7931914893617021
Standard deviation:  7.251014953359222
Sharpe Ratio:  -0.10939040871709076



In [56]:

    
result.loc['Sharpe Ratio']= Sharpe    # now we can use loc to add sharpe ratio into our regression result dataframe
result









    Out[56]:






  
    
      
      XOP
    
  
  
    
      Intercept
      -1.031915
    
    
      Market
      0.260921
    
    
      SMB
      0.319233
    
    
      HML
      0.824092
    
    
      Sharpe Ratio
      -0.109390

Conclusion

For XOP:

This ETF is not a very good investment as indicated by the negative sharpe ratio and -1% alpha.

A negative sharpe ratio would suggest that you are better off investing in a risk free bond than putting your money into the ETF.

However, the ETF may still have diverfication value in a portfolio context given its low correlation to the overall market.

For the project:

The code above can efficiently run fund regression one at a time. The next step would be to re-model the code into a loop strcture so it can reads in a list of tickers and perform the same function repeatedly. It will provide a similar table like the final result above but with more columns.

The idea is that you can quickly screen a set of mutual funds with similar strategy and investment scope. The result table would allow you to screen for the best performing fund in the subset easily.

More on the subject:

Theoractically, this can also be replicated for the Chinese fund analysis without the use of proprietary database CSMARS.

You can extract mutual fund data from: Chinese Securities Regulatory Commission Fund Disclosure with a web-scraping loop.

I cannot perform the analysis because I didn't know how to constrcut the fama-french factors for Chinese stock market

	SPY	XOP
Date
2012-01-01	4.15	7.54
2012-02-01	2.91	0.27
2012-03-01	-0.07	-7.53
2012-04-01	-3.08	-6.84
2012-05-01	-1.13	-7.17

	Market	SMB	HML	Rf
201201	5.05	2.51	-2.14	0.00
201202	4.42	-1.65	0.01	0.00
201203	3.11	-0.27	-0.06	0.00
201204	-0.85	-0.63	-0.20	0.00
201205	-6.19	-0.11	0.08	0.01

	SPY	XOP	Market	SMB	HML	Rf
1	4.15	7.54	5.05	2.51	-2.14	0.00
2	2.91	0.27	4.42	-1.65	0.01	0.00
3	-0.07	-7.53	3.11	-0.27	-0.06	0.00
4	-3.08	-6.84	-0.85	-0.63	-0.20	0.00
5	-1.13	-7.17	-6.19	-0.11	0.08	0.01

	SPY	XOP	Market	SMB	HML	Rf	Excess
1	4.15	7.54	5.05	2.51	-2.14	0.00	7.54
2	2.91	0.27	4.42	-1.65	0.01	0.00	0.27
3	-0.07	-7.53	3.11	-0.27	-0.06	0.00	-7.53
4	-3.08	-6.84	-0.85	-0.63	-0.20	0.00	-6.84
5	-1.13	-7.17	-6.19	-0.11	0.08	0.01	-7.18

	XOP
Intercept	-1.031915
Market	0.260921
SMB	0.319233
HML	0.824092
Sharpe Ratio	-0.109390